WEEK 1: INTRO AND WHAT IS A STATISTICAL MODEL
SDS 290
Scott LaCombe
Today's Plan
Introductions
Course overview
CITI training set up
A Little About Me
Joint Gov/SDS position
Graduate from University of Iowa
Originally from Kansas City, MO
I focus on state politics and networks of public policies
Teaching Assistant: Gollum LaCombe
Introductions (in small groups)
Name
Major
Year at Smith
Most recent show/movie you’ve been obsessed with
Goals of this course
Understand fundamentals of experimentation and observational research
Learn about how to design and implement survey experiments
Implement ANOVAs and similar models
Use software and data to answer real-world questions
Quick note on R and Stats Background
Assumption: you’ve taken an introductory stats class
Demonstrate familiarity with descriptive statistics, normal/t distribution, hypothesis testing,
p-values, and confidence intervals
We will be using R extensively in this class
Will start slow, but quickly build
If you are unfamiliar with R, I strongly suggest working through the first 4 chapters of ModernDive (see syllabus)
Talk with me, go to stats TAs
SDS 100
Tips for Learning this semester
Office hours:
Mondays: 2-3, Wednesdays 11-12, Thursdays 4:15-5:15
Complete readings before class
Use office hours and tutors
Post on slack
If you have a question, someone else probably does too
Also counts toward participation
Keep me in the loop if you are struggling inside/outside class
Much easier to give extensions before due date than after
Slack Channel & Moodle
A note on course Delivery and Participation
Will record lectures, no remote option
Welcome to “zoom-in” classmate
Participation and attendance contribute to course participation grade
If you can’t make it to class, email me and post something on slack.
In person R labs are critical to your learning
I’m trying to be as flexible as possible, extending the same to you
In-person attendance is expected, but if you are feeling sick or have had a close exposure, watch the recording and get notes from a friend
Syllabus
Walkthrough
Basic Structure of Course
Lecture with periodic group discussion/prompts
Weekly(ish) homework assignments, due Fridays at 11:59 PM
Periodic R workshops to build programming skills
2 exams
2 mini projects (more on these next week)
1 mini project solo
1 in groups of 3
Design and implement survey
QUESTIONS?
Before we get started with content…
By tomorrow: fill out the introductory survey
For next Friday
CITI training
Intro to R lab
Don’t put it off! CITI training takes a bit of time
WHAT IS A STATISTICAL MODEL?
Problem we face today: so much data!
Our goal
With so much info, how do we separate the signal from the noise?
Our approach: Modeling
Simplify complex processes so we can use data to better understand the world around us
All models are wrong, some are better than others
World is complex, shouldn’t forget that
Uncertainty is central
Goal of modeling
Prediction
Classification
Evaluating a treatment
Testing a theory
Summarizing a pattern
Improving a process
Making a decision
What should our goal be? (small groups)
Model basics
Y = model + error
Y
Dependent variable, response variable, outcome
Thing we are trying to explain/model
Model
Explanatory variables, independent variables
Error
Residuals, difference between predicted and observed
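A minimal sketch of Y = model + error, written in Python with only the standard library (an assumption made for illustration; the course itself uses R). The data, the group names, and the choice of "group mean" as the model are all hypothetical.

```python
from statistics import mean

# Hypothetical data: exam scores (Y) under two made-up study conditions.
scores = {
    "flashcards": [78.0, 82.0, 86.0],
    "rereading": [70.0, 74.0, 72.0],
}

# Model: predict every case with the mean of its group.
group_means = {group: mean(values) for group, values in scores.items()}

# Error: residual = observed - predicted, one per case.
residuals = {
    group: [y - group_means[group] for y in values]
    for group, values in scores.items()
}

for group, values in scores.items():
    for y, e in zip(values, residuals[group]):
        # Each observed Y is exactly its predicted value plus its residual.
        print(f"{group}: {y} = {group_means[group]} + ({e})")
```

Within each group the residuals sum to zero, which is a property of using the group mean as the prediction.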
Other important terms
Sample vs Population
Statistic vs Parameter
Inference
Parameter estimate
Causal inference
Role of experimentation vs observation
Covariates
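To make the sample/population and statistic/parameter distinction concrete, here is a small simulation sketch in Python (standard library only; the course uses R, so treat the language as an assumption). The population of commute times, its mean and spread, and the sample size of 50 are all made up.

```python
import random
from statistics import mean

random.seed(290)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 made-up commute times in minutes.
population = [random.gauss(30, 8) for _ in range(10_000)]
parameter = mean(population)  # population mean: usually unknown in practice

# One sample of 50 cases drawn from that population.
sample = random.sample(population, 50)
statistic = mean(sample)  # sample mean: our estimate of the parameter

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")

# Repeating the sampling shows how much the statistic varies across samples.
estimates = [mean(random.sample(population, 50)) for _ in range(1000)]
print(f"estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
```

Inference is the step from the statistic we can compute to the parameter we cannot observe, and the spread of `estimates` is the uncertainty that step has to account for.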
Data
Cases
Unit of analysis
Variables
Types of variables
Quantitative/continuous
Categorical
Ordinal vs nominal
Binary
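One way to see why variable type matters is to look at which summaries make sense for each type. The sketch below uses Python's standard library (an assumption; the course uses R), and every variable and value in it is hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical survey responses from five students (all values are made up).
hours_of_sleep = [7.5, 6.0, 8.0, 6.5, 7.0]                               # quantitative
favorite_season = ["fall", "summer", "fall", "spring", "winter"]         # nominal
class_year = ["first-year", "junior", "sophomore", "senior", "junior"]   # ordinal
lives_on_campus = [True, True, False, True, False]                       # binary

# Quantitative: arithmetic such as a mean is meaningful.
avg_sleep = mean(hours_of_sleep)

# Nominal: categories have no order, so report counts or the most common one.
common_season = mode(favorite_season)

# Ordinal: categories have an order, so we can rank them and take a median.
year_levels = ["first-year", "sophomore", "junior", "senior"]
ranks = [year_levels.index(y) for y in class_year]
median_year = year_levels[int(median(ranks))]

# Binary: two categories; the mean of the 0/1 codes is a proportion.
share_on_campus = mean(int(v) for v in lives_on_campus)

print(avg_sleep, common_season, median_year, share_on_campus)
```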
What types of variables are these?
Race
Education level
Income (in dollars)
Left-handed/non-left-handed
Voter turnout
Letter grade in a course
Modeling Process: 4 steps
Choose a form for the model
Fit the model
Assess the model
Address research question
Theory comes first
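The four steps can be sketched end to end on a tiny example. This is an illustration in Python's standard library (an assumption; the course uses R), using a one-way comparison of group means with hypothetical ratings for three made-up survey wordings.

```python
from statistics import mean

# Step 1: choose a form. The response is quantitative and the explanatory
# variable is categorical, so model each case with its group mean.
# Hypothetical data: ratings under three made-up survey wordings.
ratings = {
    "wording_a": [4.0, 5.0, 6.0],
    "wording_b": [7.0, 8.0, 9.0],
    "wording_c": [10.0, 11.0, 12.0],
}

# Step 2: fit the model by estimating the grand mean and each group mean.
all_values = [y for values in ratings.values() for y in values]
grand_mean = mean(all_values)
group_means = {g: mean(values) for g, values in ratings.items()}

# Step 3: assess the model by splitting variation into model and error parts.
ss_between = sum(
    len(v) * (group_means[g] - grand_mean) ** 2 for g, v in ratings.items()
)
ss_within = sum(
    (y - group_means[g]) ** 2 for g, v in ratings.items() for y in v
)
df_between = len(ratings) - 1
df_within = len(all_values) - len(ratings)
f_statistic = (ss_between / df_between) / (ss_within / df_within)
r_squared = ss_between / (ss_between + ss_within)

# Step 4: address the research question using the fitted, assessed model.
print(f"group means: {group_means}")
print(f"F = {f_statistic:.2f} on {df_between} and {df_within} df")
print(f"share of variation explained = {r_squared:.2f}")
```

Turning the F statistic into a p-value requires the F distribution, which the standard library does not provide, so the sketch stops at the statistic itself.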
Our plan
ANOVA: Analysis of Variance
Response variable: quantitative
Explanatory variable: typically categorical
Fundamentals of experimentation
Later in semester: causal inference with observational data